A clustering approach for translationese identification
Authors
Abstract
Our paper investigates the impact of translationese on the novels of a bilingual writer and asks whether the authorship of a translated document can be determined. The main part of the paper centers on selecting a good set of lexical features that can be considered characteristic of an author. We used the novels of Vladimir Nabokov, a bilingual author who wrote his works in both Russian and English. Each text is represented by a vector of function words. We examine how the results vary across different feature sets and which feature set can be considered the most representative. To inspect the results, we used a hierarchical clustering method and drew conclusions from the most frequent result.
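A minimal sketch of the pipeline the abstract describes (function-word vectors followed by hierarchical clustering), written under several assumptions: `FUNCTION_WORDS` is a hypothetical ten-word placeholder, since the paper's actual feature sets are not listed here; each vector holds relative frequencies; and the clustering is bottom-up agglomerative with Euclidean distance and average linkage, neither of which the abstract specifies. The helper names `function_word_vector` and `agglomerative` are illustrative, not the authors' code.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words;
# the paper's real feature sets are not given in the abstract.
FUNCTION_WORDS = ["the", "of", "and", "a", "in", "to", "that", "it", "was", "he"]


def function_word_vector(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each function word among all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerative(vectors, n_clusters):
    """Bottom-up hierarchical clustering; returns clusters as sorted index lists."""
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair and drop the two originals.
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


if __name__ == "__main__":
    # Toy placeholder snippets, not excerpts from Nabokov's novels.
    texts = [
        "the man and the woman walked to the house of the old doctor",
        "the boy and the girl ran to the edge of the quiet lake",
        "it was late, he said that it was time to go",
        "he knew that it was wrong, and it was too late",
    ]
    vectors = [function_word_vector(t) for t in texts]
    print(agglomerative(vectors, n_clusters=2))
```

The quadratic pair search keeps the sketch dependency-free. For real corpora, SciPy's `scipy.cluster.hierarchy.linkage(X, method='average', metric='euclidean')` followed by `fcluster(Z, t, criterion='maxclust')` performs the same kind of clustering more efficiently and also yields the full merge tree, which is what a dendrogram-based reading of the results would need.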
Similar resources
A Parallel Corpus of Translationese
We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research on translationese and its applications to (human and machine) tr...
Identification of Translationese: A Machine Learning Approach
This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach a success rate of up to 97.62% on a technical dataset. Moreover, the SVM classifier consistently reports a statistica...
Translationese: Between Human and Machine Translation
Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that focuses on investigating these characteristics. Until recently, research in machine translation (MT) has been entirely divorced from translation studies. The main goal of this tutorial is to introduce some of the find...
Translationese Traits in Romanian Newspapers: A Machine Learning Approach
This paper presents a machine learning approach to the investigation of the translationese effect on Romanian newspaper texts. The aim is to train a learning system to distinguish between translated and non-translated texts. The classifiers achieve an accuracy well above the chance level, the results confirming the existence of translationese manifestation. Also, the experiments investigate wh...
A New Approach to the Study of Translationese: Machine-learning the Difference between Original and Translated Text
In this paper we describe an approach to the identification of “translationese” based on monolingual comparable corpora and machine learning techniques for text categorization. The paper reports on experiments in which support vector machines (SVMs) are employed to recognize translated text in a corpus of Italian articles from the geopolitical domain. An ensemble of SVMs reaches 86.7% accuracy ...